This document works through an analysis of 5-digit NAICS manufacturing establishments at the county level in 2019, using County Business Patterns data. Setting aside the above need for theorizing, there is a need to define a solid workflow for understanding county-level 5-digit NAICS data. Given the sheer number of combinations of county-level and sub-sector-level activity, there is a potentially overwhelming amount of heterogeneity in these data. While a theoretical model would help structure our approach to these data, there are a few steps that we can take to identify some preliminary, county-level trends.
The first question we can ask of our data is: what is the distribution of very large establishments (as measured by employment) across both industries and geographies? Here, we draw on both anchor-tenant theory and manufacturing-specific work that emphasizes the role of very large establishments in coordinating regional manufacturing activity and supply chains.
The procedure defined here could easily be applied to granular data about GDP (in partnership with the Census perhaps), or to annual sales data from the NETS database. Given what we know about varieties of manufacturing, there may be substantial value in contrasting employment results with sales results.
The second question we can ask of our data is: how does the distribution of small and medium-sized establishments by industry (and, to a lesser extent, geography) compare against that of the large establishments in specific regions?
The third question we can ask of our data is: how concentrated or distributed are states across geographies and counties? We can create a four by four matrix here. Coupled with this question, we can also ask: how do county-level trends compare against state-level trends?
Here we define a procedure to clean and pre-process the data. The data we rely on here are from the US Census County Business Patterns at the 5-digit NAICS level, focusing on the year 2019. We have the same data from 2016-2019, and have a 2-digit all industries control for 2017 and 2019.
Using existing data, we create baseline maps to build upon, as well as some helper functions to assist with mapping.
We start with a base map of the manufacturing share of state employment in 2019.
In addition to the above basic maps, we will define some color schemes for manufacturing at the 3-digit level. While the analysis for this section and document involves 5-digit NAICS codes, the 3-digit level offers enough variation in color to capture major industry trends.
As such, we need to define a way to reduce the NAICS digit codes of our 5-digit data (a trivial task), and crosswalk these with the 3-digit titles and colors.
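As a minimal sketch of that step, the snippet below truncates 5-digit codes to three digits and joins them to a crosswalk. It is written in base R so that it runs without extra packages. The data frames, the column names `naics_5digit`, `estab`, `title`, and `colour`, and the colour values are illustrative assumptions, not the objects used in this analysis.

```r
# Toy 5-digit records; column names and counts are assumptions for illustration.
cbp <- data.frame(
  naics_5digit = c("33641", "33451", "31161", "32619"),
  estab        = c(4, 2, 7, 12),
  stringsAsFactors = FALSE
)

# Hypothetical 3-digit crosswalk: one title and one plotting colour per sub-sector.
crosswalk <- data.frame(
  naics_3digit = c("336", "334", "311", "326"),
  title        = c("Transportation Equipment Manufacturing",
                   "Computer and Electronic Product Manufacturing",
                   "Food Manufacturing",
                   "Plastics and Rubber Products Manufacturing"),
  colour       = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a"),
  stringsAsFactors = FALSE
)

# Keep NAICS codes as character strings and take the first three characters.
cbp$naics_3digit <- substr(cbp$naics_5digit, 1, 3)

# Left join: 5-digit rows without a crosswalk match are kept with NA title/colour.
cbp_3digit <- merge(cbp, crosswalk, by = "naics_3digit", all.x = TRUE)
print(cbp_3digit)
```

Storing the codes as character rather than numeric avoids accidental arithmetic on them and keeps the truncation a simple string operation.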
With these data, our preliminary, and primary, analysis will focus on county-level employment by sub-sector. Because details about establishment employment are suppressed when there are only a few establishments, we first need a way to estimate the number of employees at an establishment from its employment size class. Following Delgado et al. (2014), we use the midpoint of an establishment's employment size range as our estimate of the number of employees at that establishment.
## # A tibble: 10 × 4
## # Groups: EMPSZES, EMPSZES_LABEL, emp_est [10]
## EMPSZES EMPSZES_LABEL emp_est n
## <int> <chr> <dbl> <int>
## 1 1 All establishments NA 23708
## 2 210 Establishments with less than 5 employees 3 8143
## 3 220 Establishments with 5 to 9 employees 7 3626
## 4 230 Establishments with 10 to 19 employees 15 3111
## 5 241 Establishments with 20 to 49 employees 35 3067
## 6 242 Establishments with 50 to 99 employees 75 1141
## 7 251 Establishments with 100 to 249 employees 175 837
## 8 252 Establishments with 250 to 499 employees 375 136
## 9 254 Establishments with 500 to 999 employees 750 29
## 10 260 Establishments with 1,000 employees or more 1000 16
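One way the size-class estimates in the table above could be applied is sketched below. The `emp_est` values are copied from that table (note that the open-ended top class is assigned 1,000); the `counts` data frame, its `estab` column, and the county identifiers are invented for illustration.

```r
# Size-class lookup; emp_est values are taken from the table above.
size_lookup <- data.frame(
  EMPSZES = c(210, 220, 230, 241, 242, 251, 252, 254, 260),
  emp_est = c(3, 7, 15, 35, 75, 175, 375, 750, 1000)
)

# Hypothetical establishment counts by county and size class.
counts <- data.frame(
  GEO_ID  = c("A", "A", "B"),
  EMPSZES = c(210, 251, 260),
  estab   = c(10, 2, 1),
  stringsAsFactors = FALSE
)

# Attach the per-establishment estimate, then scale by the establishment count.
est <- merge(counts, size_lookup, by = "EMPSZES", all.x = TRUE)
est$emp_total_est <- est$estab * est$emp_est

# Estimated employment per county.
county_emp <- aggregate(emp_total_est ~ GEO_ID, data = est, FUN = sum)
print(county_emp)
```

Because every establishment in a class receives the same estimate, the resulting county totals are approximations and are best read as rough orders of magnitude.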
We immediately see that the distribution of establishment size is heterogeneous across different sub-sectors and geographies of manufacturing activity: more industries, and more counties, see a higher density of small and medium-sized manufacturing activity, while fewer sub-sectors and fewer counties are responsible for the majority of large-establishment manufacturing activity.
Before we summarize these data, we define some helper functions to make repeating this process easier.
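A helper of this kind might look like the following sketch, which counts establishments and distinct counties per 3-digit sub-sector for one size class. The function name `summarise_size_class`, the `estab` column, and the toy values are hypothetical; the helpers used in this analysis may differ.

```r
# Toy records; GEO_ID, naics_3digit, and EMPSZES mirror names used in this
# document, while `estab` and all values are invented for illustration.
cbp <- data.frame(
  GEO_ID       = c("c1", "c2", "c2", "c3"),
  naics_3digit = c("336", "336", "334", "311"),
  EMPSZES      = c(260, 260, 260, 254),
  estab        = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: establishments and distinct counties per sub-sector
# for a single size class, sorted from most to fewest establishments.
summarise_size_class <- function(data, size_code) {
  rows <- data[data$EMPSZES == size_code, , drop = FALSE]
  if (nrow(rows) == 0) {
    # No establishments in this size class: return an empty summary.
    return(data.frame(naics_3digit = character(0), estab = numeric(0),
                      n_counties = integer(0)))
  }
  estab <- tapply(rows$estab, rows$naics_3digit, sum)
  n_counties <- tapply(rows$GEO_ID, rows$naics_3digit,
                       function(x) length(unique(x)))
  out <- data.frame(
    naics_3digit = names(estab),
    estab        = as.vector(estab),
    n_counties   = as.vector(n_counties[names(estab)]),
    stringsAsFactors = FALSE
  )
  out[order(-out$estab), ]
}

very_large <- summarise_size_class(cbp, 260)
print(very_large)
```

Calling the same function with a different `size_code` (for example `254`) repeats the summary for the next size class without duplicating code.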
We also define a few helper functions to join our data to county-centers, and create our map.
We can combine these functions to produce a map given a condition.
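The sketch below illustrates one way such a combination could work: filter the records with a caller-supplied condition, then attach county-center coordinates so the result is ready to plot. The function name `prepare_map_data`, the `centers` data frame, and its `lon`/`lat` values are made up for illustration; the plotting step itself is left out.

```r
# Toy establishment records (invented values).
cbp <- data.frame(
  GEO_ID       = c("c1", "c2", "c3"),
  naics_3digit = c("336", "334", "311"),
  EMPSZES      = c(260, 260, 254),
  stringsAsFactors = FALSE
)

# Hypothetical county-center table; the coordinates are placeholders.
centers <- data.frame(
  GEO_ID = c("c1", "c2", "c3"),
  lon    = c(-100, -95, -90),
  lat    = c(35, 40, 45),
  stringsAsFactors = FALSE
)

# Keep the rows for which `condition` returns TRUE, then join the coordinates.
prepare_map_data <- function(data, centers, condition) {
  kept <- data[condition(data), , drop = FALSE]
  merge(kept, centers, by = "GEO_ID", all.x = TRUE)
}

# Example condition: establishments with 1,000 employees or more.
map_ready <- prepare_map_data(cbp, centers, function(d) d$EMPSZES == 260)
print(map_ready)
```

Passing the condition as a function keeps the helper generic: the same call works for a size class, a sub-sector, or any combination of the two.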
As we seek to understand the distribution of establishments of different sizes across geographies and industries, we will begin with the largest establishments and steadily work our way down. To accomplish this, we will want both to sequentially plot large establishments in the counties and sub-sectors in which they are located, and to create summary graphs of the distribution of establishments across sectors. Map locations will be displayed at the 5-digit NAICS level, and summary distributions will be primarily at the 3-digit level, though 5-digit results will be tested and examined as well. As we proceed through this work, we will try to define functions that help simplify and accelerate the process.
There are extremely few establishments with 1,000 or more employees, and these very large establishments are concentrated in very few sub-sectors. The most establishments are found in aerospace product and parts manufacturing; navigational, measuring, electromedical and control instruments manufacturing; animal slaughtering and processing; and medical equipment and supplies manufacturing.
Geographically, we see that very large establishments are concentrated in California (especially Los Angeles) and in Salt Lake City.
We now turn to establishments with between 500 and 999 employees. Here, we can rely on our previously defined helper functions to speed up our analysis.
We see that the distribution of large establishments across 5-digit sub-sectors differs from the distribution of very large establishments. There are more large establishments in animal slaughtering and processing; pharmaceutical and medicine manufacturing; and motor vehicle body and trailer manufacturing. However, the number of counties that large establishments are distributed across is only slightly higher than the number for very large establishments.
While the geographical distribution of large establishments follows some patterns similar to those of very large establishments, there are a few notable differences. Some new counties (and states) support large establishments (e.g., North Carolina, Georgia, Alabama). Most of the locations that supported very large establishments also see the presence of large establishments, with a few exceptions (e.g., Delaware and Washington).
Overlaying the approximate locations of large establishments on top of very large establishments illustrates how large establishments located near very large establishments tend to be in related sub-sectors. Again, there are a few exceptions, such as the appearance of pharmaceutical and medicine manufacturing across California, as well as the appearance of a few machinery-related sub-sectors in Los Angeles, Lake County (IN), and Ottawa County (MI).
We now turn to establishments with between 250 and 499 employees. We may later combine these establishments with those with between 100 and 249 employees.
When considering these establishments, we see an exponential increase in both the number of counties as well as the number of sub-sectors that these establishments are distributed over. While the largest sub-sectors (in terms of number of establishments) follow similar patterns to the sub-sectors in very large and large establishments, there are a number of new sub-sectors with very few establishments that appear.
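A claim of exponential growth can be checked by counting distinct counties and sub-sectors per size class and looking at the ratio between successive classes. The sketch below does this on invented data; the `naics_5digit` column name and all values are assumptions for illustration only.

```r
# Invented records: one row per county x 5-digit sub-sector x size class.
cbp <- data.frame(
  GEO_ID       = c("c1",
                   "c1", "c2",
                   "c1", "c2", "c3", "c4"),
  naics_5digit = c("33641",
                   "33641", "31161",
                   "33641", "31161", "32311", "32619"),
  EMPSZES      = c(260,
                   254, 254,
                   252, 252, 252, 252),
  stringsAsFactors = FALSE
)

# Distinct counties and sub-sectors within each size class.
breadth <- do.call(rbind, lapply(split(cbp, cbp$EMPSZES), function(d) {
  data.frame(
    EMPSZES      = d$EMPSZES[1],
    n_counties   = length(unique(d$GEO_ID)),
    n_subsectors = length(unique(d$naics_5digit))
  )
}))

# Order from the largest size class downward, then compute step-to-step ratios.
breadth <- breadth[order(-breadth$EMPSZES), ]
breadth$county_growth <- c(NA, breadth$n_counties[-1] /
                                 head(breadth$n_counties, -1))
print(breadth)
```

A ratio that stays roughly constant and above 1 from one class to the next would be consistent with exponential (geometric) growth, whereas a ratio that shrinks toward 1 would suggest the plateau discussed for smaller size classes.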
Geographically, we again see the introduction of new counties and states that support medium-large establishments (such as Oregon), as well as the introduction of new geographic centers within the same states (such as Eastern Washington). Overall, however, the distribution of medium-large establishments closely follows the areas seen in the distribution of large and very large establishments.
Overlaying the locations of these establishments on those of larger establishments illustrates how new geographical areas support medium-large manufacturers, as well as how the sub-sectors of medium-large establishments differ from (or are similar to) the sub-sectors of larger establishments. For example, in Minnesota, while very large and large establishments are in navigational, measuring, electromedical and control instruments manufacturing, medium-large establishments in that region are in printing, pump and compressor manufacturing, and other plastics product manufacturing.
Continuing down the list of establishment sizes, we turn to establishments with between 100 and 249 employees. We again see an exponential increase in the number of sub-sectors and geographies that medium-sized establishments are distributed over, and an even steeper skew in the density of establishments across sub-sectors, with plastic product manufacturing, printing, and paperboard container manufacturing dominating the top of the chart (a notable divergence from patterns in medium-large, large, and very large establishments).
Geographically, we again see the introduction of new counties and states that support medium-sized manufacturing establishments (e.g., Colorado and New Mexico). However, the concentration of medium-sized companies around certain clusters of larger-establishment activity stands out, especially in the Midwest, Southeast (Bible Belt), Northwest, California, and Northeast. Certain geographies, such as Pennsylvania, seem to support far more medium-sized establishments than larger establishments.
Overlaying the approximate locations of medium-sized establishments onto our previous maps further illustrates how certain regions support medium-sized activity without the presence of larger establishments (e.g., Colorado, Nevada, Pennsylvania, Texas, Florida, and to a certain extent Oregon), while other regions see establishments of multiple sizes.
Crucially, this layer of our analysis suggests that there may be something fundamentally different about the structure of manufacturing activity in regions with medium-sized establishments only, compared with manufacturing activity in regions that support establishments of multiple sizes.
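The contrast between regions with medium-sized establishments only and regions with establishments of multiple sizes can be made explicit with simple set operations on county identifiers. The following is a sketch on invented data; the size-class codes follow the table earlier in this document, and the county identifiers are hypothetical.

```r
# Invented records: which size classes are present in which county.
cbp <- data.frame(
  GEO_ID  = c("c1", "c1", "c2", "c3", "c3"),
  EMPSZES = c(251, 260, 251, 251, 254),
  stringsAsFactors = FALSE
)

medium_code  <- 251                # 100 to 249 employees
larger_codes <- c(252, 254, 260)   # 250 employees or more

counties_medium <- unique(cbp$GEO_ID[cbp$EMPSZES == medium_code])
counties_larger <- unique(cbp$GEO_ID[cbp$EMPSZES %in% larger_codes])

# Counties with medium-sized establishments but no larger ones.
medium_only <- setdiff(counties_medium, counties_larger)

# Counties where medium-sized and larger establishments co-locate.
mixed <- intersect(counties_medium, counties_larger)

print(medium_only)
print(mixed)
```

The two resulting county lists could then be compared on sub-sector mix or mapped separately to test whether the visual impression from the overlay holds up.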
We continue down the list, now turning to establishments with 50-99 employees, and are increasingly conscious that we may need to collapse establishment size categories at this level to facilitate analysis. However, at a first pass, it is useful to continue our granular, step-by-step process.
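If size categories are collapsed, a named lookup vector keeps the mapping explicit and easy to revise. The sketch below is one possible grouping: the `100 to 499` group follows the combination suggested earlier in this document, while the other groupings, the `estab` column, and the data values are assumptions chosen only for illustration.

```r
# Invented establishment counts by county and size class.
cbp <- data.frame(
  GEO_ID  = c("c1", "c1", "c1", "c2"),
  EMPSZES = c(251, 252, 210, 220),
  estab   = c(2, 1, 5, 4),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from CBP size-class codes to collapsed groups.
size_group <- c(
  "210" = "under 20",   "220" = "under 20",   "230" = "under 20",
  "241" = "20 to 99",   "242" = "20 to 99",
  "251" = "100 to 499", "252" = "100 to 499",
  "254" = "500 or more", "260" = "500 or more"
)

# Look up each row's group by its size-class code.
cbp$size_group <- unname(size_group[as.character(cbp$EMPSZES)])

# Sum establishments within each county and collapsed group.
collapsed <- aggregate(estab ~ GEO_ID + size_group, data = cbp, FUN = sum)
print(collapsed)
```

Changing the grouping then only requires editing the lookup vector, so alternative cut-offs can be compared without touching the rest of the workflow.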
Here, we see that establishments are distributed across still more sub-sectors and geographies than in the previous size category, but the rate of increase has started to plateau slightly. Interestingly, while printing and plastic manufacturing dominate the top of this graph, there are a substantial number of medium-small fabricated metal and machinery manufacturers.
The geographical distribution of medium-small establishments continues to follow certain very interesting patterns. In certain geographies, there appears to be substantial co-location of medium-small establishments with larger establishments. However, certain geographies (such as Montana and North Dakota) support medium-sized establishments without any larger establishments, and medium-small establishments appear to co-locate near medium-sized establishments as well, not just near larger establishments.
The moniker "small-medium establishments" may need to be adjusted, but currently refers to establishments with between 20 and 49 employees. These establishments are classified as small-medium to denote our hypothesis that they belong to a different class of establishments. While we do not see an exponential increase in the distribution of these establishments across sectors, we certainly see an exponential increase in the distribution of these establishments across counties. In addition, there does appear to be a structural shift in the types of industries that dominate establishment counts (e.g., ornamental and architectural metal products).
At this point, we are crucially not controlling for GDP or output. As such, recreating this analysis with size as measured by annual sales will be a critical robustness check to perform.
Small-medium sized establishments continue to populate familiar corridors, but in some areas they appear to be distributed across substantially different sectors than larger establishments (e.g., millwork and wood-product manufacturing in Los Angeles). In other regions, such as the Midwest, small-medium sized establishments are distributed across very similar sectors to larger establishments. Notably, manufacturing activity in Florida seems to consist of a large number of smaller manufacturing establishments and few large incumbents.
These differences suggest that certain manufacturing ecosystems might be supported by strong, deep supplier networks at the small-medium level (20-49 employees), while other manufacturing ecosystems see experimentation, specialization, and artisan manufacturing begin at this level of establishment size.
We now turn to small establishments, with 10-19 employees. Exponential growth in the distribution of establishments across geographies and sectors has leveled off. There is some evidence that the structure of these establishments is similar to the structure of small-medium establishments: highly specialized, smaller establishments.
Again, small establishments locate in familiar areas. There is a high density of small establishments in the eastern corridor, as well as in certain areas, such as Florida, and in the Bay Area of California. These patterns further support the hypothesis that certain geographies support strong, vertically organized supplier networks driven by large focal firms, and other geographies have more distributed, specialized, and bespoke manufacturing environments. Curiously, certain geographies might be defined by both categories (e.g. Los Angeles), and these typologies vary within states.
The penultimate category is manufacturing establishments with between 5 and 9 employees. There appears to be another step increase in the geographic distribution of very small manufacturing establishments. In addition, there appears to be yet another structural shift in the type of sectors that these establishments are distributed over, with many machine shops.
The geographic distribution of very small establishments follows that of small establishments. The sectoral overlap is unclear.
We conclude with tiny establishments: establishments with fewer than 5 employees. These are likely either highly automated manufacturing establishments or highly artisanal manufacturing establishments. We again see a potentially exponential increase in the number of counties that tiny establishments are distributed over.
These establishments are distributed truly all over the country. Curiously, some of the counties with very large manufacturers do not have these types of establishments.